Can Subcategorisation Probabilities Help a Statistical Parser?
Abstract
Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English, using subcategorisation frequencies acquired from ten million words of text, which shows that this information can significantly improve parse accuracy.
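To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' method: it estimates the relative frequency of each subcategorisation frame per verb from observed (verb, frame) pairs, then uses those probabilities to reweight candidate parses. The frame labels, the add-alpha smoothing, the toy counts, and the function names `estimate_frame_probs` and `rescore` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def estimate_frame_probs(observations, frames, alpha=1.0):
    """Estimate P(frame | verb) by relative frequency with add-alpha smoothing.

    observations: iterable of (verb, frame) pairs (hypothetical corpus counts).
    frames: the closed set of frame labels assumed for this sketch.
    """
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    probs = {}
    for verb, c in counts.items():
        total = sum(c.values()) + alpha * len(frames)
        probs[verb] = {f: (c[f] + alpha) / total for f in frames}
    return probs


def rescore(parses, probs):
    """Return the parse maximising base score times P(frame | verb)."""
    return max(parses, key=lambda p: p["score"] * probs[p["verb"]][p["frame"]])


# Toy data, invented for illustration only.
FRAMES = ["NP", "NP_PP", "INTRANS"]
observations = [("put", "NP_PP")] * 8 + [("put", "NP")]
probs = estimate_frame_probs(observations, FRAMES)

candidates = [
    {"id": "a", "verb": "put", "frame": "NP", "score": 0.6},
    {"id": "b", "verb": "put", "frame": "NP_PP", "score": 0.4},
]
best = rescore(candidates, probs)
```

In this toy case the parser's base scores prefer candidate "a", but the frame probabilities for "put" favour the NP_PP frame strongly enough that candidate "b" is selected after reweighting.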
Similar resources
Three Generative, Lexicalised Models for Statistical Parsing
In this paper we first propose a new statistical parsing model, which is a generative model of lexicalised context-free grammar. We then extend the model to include a probabilistic treatment of both subcategorisation and wh-movement. Results on Wall Street Journal text show that the parser performs at 88.1/87.5% constituent precision/recall, an average improvement of 2.3% over (Collins 96).
Can Subcategorization Help a Statistical Dependency Parser?
Today there is a relatively large body of work on the automatic acquisition of lexico-syntactic preferences (subcategorization) from corpora. Various techniques have been developed that not only produce machine-readable subcategorization dictionaries but can also weight the various subcategorization frames probabilistically. Clearly there should be a potential to use such weighted...
Learning Subcategorisation Information to Model a Grammar with “Co-restrictions”
This paper describes two different tasks involving the notion of subcategorisation in NLP. First, it presents a specific strategy to acquire both nominal and verbal subcategorisation from text corpora. More precisely, we describe an unsupervised method for extracting syntactic and semantic subcategorisation from partially parsed texts. The second task concerns the usage of subcategorisation inf...
Bootstrapping Statistical Processing Into A Rule-Based Natural Language Parser
This paper describes a "bootstrapping" method which uses a broad-coverage, rule-based parser to compute probabilities while parsing an untagged corpus of NL text, and which then incorporates those probabilities into the processing of the same parser as it analyzes new text. Results are reported which show that this method can significantly improve the speed and accuracy of the parser without re...
Journal: CoRR
Volume: cmp-lg/9806013
Publication date: 1998